Sequential Classifiers Combination for Text Categorization: An Experimental Study

نویسندگان

  • Zheng Zhang
  • Shuigeng Zhou
  • Aoying Zhou
چکیده

In this paper, we introduce Sequential Classifiers Combination (SCC) into text categorization to improve both the classification effectiveness and classification efficiency of the combined individual classifiers. We apply two classifiers sequentially for experimental study, where the first classifier (called filtering classifier) is used to generate candidate categories for the test document and the second classifier (called deciding classifier) is used to select a final category for the test document from the candidate categories. Experimental results indicate that when combining boosting and kNN methods, the combined classifier outperforms the best one of the two individual classifiers, and in the case of combining Rocchio and kNN methods, the combined classifier performs equally well as kNN while its efficiency is much better than kNN and is close to that of Rocchio.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

KAN and RinSCut: Lazy Linear Classifier and Rank-in-Score Threshold in Similarity-Based Text Categorization

Two important research areas in statistical approaches for automated text categorization are similarity-based learning algorithms and associated thresholding strategies. The combination of these techniques significantly influences the overall performance of text categorization systems. After researching common techniques in both areas, we describe a lazy linear classifier known as the keyword a...

متن کامل

Multi-label Text Categorization with Model Combination based on F1-score Maximization

Text categorization is a fundamental task in natural language processing, and is generally defined as a multi-label categorization problem, where each text document is assigned to one or more categories. We focus on providing good statistical classifiers with a generalization ability for multi-label categorization and present a classifier design method based on model combination and F1-score ma...

متن کامل

Experimental Assessment of a Threshold Selection Algorithm for Tuning Classifiers in the Field of Hierarchical Text Categorization

Text Categorization is the task of assigning predefined categories to text documents. It can provide conceptual views of document collections and has many important applications in the real world. Nowadays, most of the research on text categorization has focused on mapping text documents to a set of categories among which structural relationships hold. Without loss of generality, let us assume ...

متن کامل

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for consequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, extraction of the names of the molecules and their properties. Improvement in the performance of such systems may affects the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

متن کامل

A Comparative Experimental Assessment of a Threshold Selection Algorithm in Hierarchical Text Categorization

Most of the research on text categorization has focused on mapping text documents to a set of categories among which structural relationships hold, i.e., on hierarchical text categorization. For solutions of a hierarchical problem that make use of an ensemble of classifiers, the behavior of each classifier typically depends on an acceptance threshold, which turns a degree of membership into a d...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004